The suite is composed of various checks, such as Under Annotated Property Segments, Under Annotated Meta Data Segments, and Property Label Correlation.
Each check may contain conditions (which result in pass ✓ / fail ✖ / warning ! / error ⁈) as well as other outputs such as plots or tables.
Suites, checks, and conditions can all be modified. Read more about custom suites.
| Status | Check | Condition | More Info |
|---|---|---|---|
| ✖ | Conflicting Labels - Train Dataset | Ambiguous sample ratio is less or equal to 0% | Ratio of samples with conflicting labels: 0.21% |
| ✓ | Special Characters - Train Dataset | Ratio of samples containing more than 20% special characters is below 5% | Found 1 samples with special char ratio above threshold |
| ✓ | Text Duplicates - Test Dataset | Duplicate data ratio is less or equal to 5% | Found 1.36% duplicate data |
| ✓ | Text Duplicates - Train Dataset | Duplicate data ratio is less or equal to 5% | Found 2.58% duplicate data |
| ✓ | Special Characters - Test Dataset | Ratio of samples containing more than 20% special characters is below 5% | Found 0 samples with special char ratio above threshold |
| ✓ | Conflicting Labels - Test Dataset | Ambiguous sample ratio is less or equal to 0% | Ratio of samples with conflicting labels: 0% |
| ✓ | Unknown Tokens - Train Dataset | Ratio of unknown words is less than 0% | Ratio was 0% |
| ✓ | Unknown Tokens - Test Dataset | Ratio of unknown words is less than 0% | Ratio was 0% |
| ✓ | Frequent Substrings - Train Dataset | No more than 1 substrings with ratio above 0.05 | Found 0 substrings with ratio above threshold |
| ✓ | Frequent Substrings - Test Dataset | No more than 1 substrings with ratio above 0.05 | Found 0 substrings with ratio above threshold |

Conflicting Labels - Train Dataset: finds identical samples that have different labels.
| Status | Condition | More Info |
|---|---|---|
| ✖ | Ambiguous sample ratio is less or equal to 0% | Ratio of samples with conflicting labels: 0.21% |

| Observed Labels | Sample IDs | Text |
|---|---|---|
| 1, 0, 0, 0, 0 | 249, 571, 1729, 2243, 2370 | يسقط الانقلاب |

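The idea behind this check can be illustrated with a minimal sketch in plain Python. It is not the deepchecks implementation: the function name `conflicting_label_ratio` is hypothetical, and it assumes that texts are compared after stripping surrounding whitespace only, and that every sample in a group of identical texts with more than one distinct label counts as conflicting.

```python
from collections import defaultdict


def conflicting_label_ratio(texts, labels):
    """Return (ratio, groups) for identical texts that carry different labels.

    Assumption: two samples are "identical" when their stripped text matches.
    """
    groups = defaultdict(list)
    for idx, (text, label) in enumerate(zip(texts, labels)):
        groups[text.strip()].append((idx, label))

    # Keep only groups whose members disagree on the label.
    conflicting = {
        text: members
        for text, members in groups.items()
        if len({label for _, label in members}) > 1
    }

    # Every sample in a conflicting group counts toward the ratio.
    n_conflicting = sum(len(members) for members in conflicting.values())
    ratio = n_conflicting / len(texts) if texts else 0.0
    return ratio, conflicting
```

Under these assumptions, a group such as the one in the table (five samples with labels 1, 0, 0, 0, 0) would contribute five conflicting samples to the numerator.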
Text Duplicates - Train Dataset: checks for duplicate samples in the dataset.
| Status | Condition | More Info |
|---|---|---|
| ✓ | Duplicate data ratio is less or equal to 5% | Found 2.58% duplicate data |

| Text | Sample IDs | Number of Samples |
|---|---|---|
| كلنا قيس السعيد | 578, 1468, 1755 | 3 |
| قيسون ولد الشعب امعاه ربي والش... | 1107, 1960 | 2 |
| فاسد يريد ان يكون رئيس دوله | 1005, 1227 | 2 |
| يحيا قيسون | 1058, 1417 | 2 |
| الزقفونه بواس لكتاف انفضح | 712, 2129 | 2 |
Text Duplicates - Test Dataset: checks for duplicate samples in the dataset.
| Status | Condition | More Info |
|---|---|---|
| ✓ | Duplicate data ratio is less or equal to 5% | Found 1.36% duplicate data |

| Text | Sample IDs | Number of Samples |
|---|---|---|
| قيس سعيد | 132, 441 | 2 |
| يحيا قيس سعيد | 260, 453 | 2 |
| المرزوقي يشرف كل تونسي حر ويكف... | 285, 312 | 2 |
| كلنا قيس سعيد | 163, 238 | 2 |
| الناس الكل عندها الحق تتكلم ال... | 437, 439 | 2 |
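
A duplicate ratio of this kind can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: the helper names are hypothetical, normalization is limited to lowercasing and collapsing whitespace, and a sample counts as a duplicate when it repeats an earlier normalized text.

```python
from collections import defaultdict


def normalize(text):
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def duplicate_groups(texts):
    """Map each normalized text that occurs more than once to its sample indices."""
    index = defaultdict(list)
    for idx, text in enumerate(texts):
        index[normalize(text)].append(idx)
    return {text: ids for text, ids in index.items() if len(ids) > 1}


def duplicate_ratio(texts):
    """Fraction of samples that repeat an earlier (normalized) text."""
    if not texts:
        return 0.0
    n_unique = len({normalize(text) for text in texts})
    return (len(texts) - n_unique) / len(texts)
```

With this definition, a text that appears three times contributes two duplicates, since the first occurrence is treated as the original.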
Special Characters - Train Dataset: finds samples that contain special characters, along with the most common special characters in each sample.
| Status | Condition | More Info |
|---|---|---|
| ✓ | Ratio of samples containing more than 20% special characters is below 5% | Found 1 samples with special char ratio above threshold |

| Sample ID | Ratio of Special Characters | Special Characters | Text |
|---|---|---|---|
| 346 | 0.29 | ['َ', 'ِ', 'ْ', 'ُ', 'ّ'] | الشَعْب يُرِيد اِسْقَاط وَ عَزْل المُسمَي وَ المَدْعُو قَيْس سَعِيد المُجْرِم الدِكْتَاتُور الاِنقلَ |
| 1672 | 0.06 | ['ُ', 'ٌ'] | تحيا تونس ورئيسها السيٌدقيُس سعيُد رئيس الجمهوريه |
| 2161 | 0.05 | ['َ'] | لم يغادوَروا الاخوان |
| 380 | 0.04 | ['ّ'] | عيّن الياس الفخفاخ و فشل عيّن هشام المشيشي و فشلعيّن بعد الانقلاب نجلاء بودن و فشلتعيّن احمد الحشاني |
| 902 | 0.03 | ['ْ'] | كلامه نضري لا يمت للواقع بشيْ |
Special Characters - Test Dataset: finds samples that contain special characters, along with the most common special characters in each sample.
| Status | Condition | More Info |
|---|---|---|
| ✓ | Ratio of samples containing more than 20% special characters is below 5% | Found 0 samples with special char ratio above threshold |

| Sample ID | Ratio of Special Characters | Special Characters | Text |
|---|---|---|---|
| 431 | 0.02 | ['ٌ'] | انت بعيد صديقي ولا تعلم انه تسبٌب في ازمه تونس بتصرٌفاته الحمقاء لقد كان في فتره رئاسته مجرٌد طرطور |
| 399 | 0.02 | ['ّ'] | عجيبه تفسيرات قيس سعيّد لنتائج الانتخابات |
| 302 | 0.01 | ['ّ'] | النظام العلماني الشرس المدعوم من الاستعمار الفرنسي والامريكي لا زال يتحكّم في النظام التونسي المعادي |
| 19 | 0.00 | [] | تونس اصبحت مهزله |
| 0 | 0.00 | [] | رايته عند المومياء المحنط متع الحيمايا تعيس هذا الكف الاول من عند الشعب صفر منتخب والكف الثاني عند ر |
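
The per-sample ratio in these tables can be approximated with the standard `unicodedata` module. The sketch below rests on an assumption about what counts as "special": any character that is not a letter, digit, whitespace or separator, or punctuation mark by Unicode category. The library's own definition may differ. Under this assumption, Arabic diacritics (combining marks, category `Mn`) are special while Arabic letters are not, which matches the characters listed above. The function names and the `max_ratio` parameter are hypothetical.

```python
import unicodedata


def is_special(ch):
    """Assumed rule: letters (L*), numbers (N*), separators (Z*), punctuation (P*)
    and whitespace are ordinary; everything else is special."""
    if ch.isspace():
        return False
    return unicodedata.category(ch)[0] not in "LNZP"


def special_char_ratio(text):
    """Fraction of characters in `text` that are special (0.0 for empty text)."""
    if not text:
        return 0.0
    return sum(is_special(ch) for ch in text) / len(text)


def samples_above_threshold(texts, max_ratio=0.2):
    """Indices of samples whose special-character ratio exceeds `max_ratio`."""
    return [i for i, text in enumerate(texts) if special_char_ratio(text) > max_ratio]
```

The default `max_ratio=0.2` mirrors the 20% threshold named in the condition; the condition itself then compares the share of flagged samples against 5%.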
| Check | Reason |
|---|---|
| Text Property Outliers - Test Dataset | Functionality requires properties, but the TextData object had none. To use this functionality, use the set_properties method to set your own properties with a pandas.DataFrame or use TextData.calculate_builtin_properties to add the default deepchecks properties. |
| Under Annotated Property Segments - Train Dataset | Functionality requires properties, but the TextData object had none. To use this functionality, use the set_properties method to set your own properties with a pandas.DataFrame or use TextData.calculate_builtin_properties to add the default deepchecks properties. |
| Text Property Outliers - Train Dataset | Functionality requires properties, but the TextData object had none. To use this functionality, use the set_properties method to set your own properties with a pandas.DataFrame or use TextData.calculate_builtin_properties to add the default deepchecks properties. |
| Property Label Correlation - Test Dataset | Functionality requires properties, but the TextData object had none. To use this functionality, use the set_properties method to set your own properties with a pandas.DataFrame or use TextData.calculate_builtin_properties to add the default deepchecks properties. |
| Property Label Correlation - Train Dataset | Functionality requires properties, but the TextData object had none. To use this functionality, use the set_properties method to set your own properties with a pandas.DataFrame or use TextData.calculate_builtin_properties to add the default deepchecks properties. |
| Under Annotated Meta Data Segments - Test Dataset | Functionality requires metadata, but the TextData object had none. To use this functionality, use the set_metadata method to set your own metadata with a pandas.DataFrame. |
| Under Annotated Meta Data Segments - Train Dataset | Functionality requires metadata, but the TextData object had none. To use this functionality, use the set_metadata method to set your own metadata with a pandas.DataFrame. |
| Under Annotated Property Segments - Test Dataset | Functionality requires properties, but the TextData object had none. To use this functionality, use the set_properties method to set your own properties with a pandas.DataFrame or use TextData.calculate_builtin_properties to add the default deepchecks properties. |
| Frequent Substrings - Test Dataset | Nothing found |
| Unknown Tokens - Test Dataset | Nothing found |
| Conflicting Labels - Test Dataset | Nothing found |
| Frequent Substrings - Train Dataset | Nothing found |
| Unknown Tokens - Train Dataset | Nothing found |
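
The messages in the table name the `TextData` methods that would let the skipped checks run: `calculate_builtin_properties` (or `set_properties`) and `set_metadata`. A minimal sketch of how they might be called follows, assuming deepchecks with its NLP extras and pandas are installed. The helper name `build_text_data`, the `metadata_rows` argument, and the `text_classification` task type are assumptions for illustration and do not come from the report.

```python
def build_text_data(texts, labels, metadata_rows):
    """Build a TextData object that carries both properties and metadata.

    `metadata_rows` is a hypothetical list of dicts, one per sample,
    for example [{"source": "facebook"}, ...].
    """
    # Imported inside the function so the sketch can be loaded
    # even where deepchecks or pandas is not installed.
    import pandas as pd
    from deepchecks.nlp import TextData

    # Assumption: the task is text classification with one label per sample.
    data = TextData(raw_text=texts, label=labels, task_type="text_classification")

    # Adds the default deepchecks text properties; this may take a while
    # and may download auxiliary resources on first use.
    data.calculate_builtin_properties()

    # Metadata must be a pandas.DataFrame with one row per sample.
    data.set_metadata(pd.DataFrame(metadata_rows))
    return data
```

Once the train and test objects are built this way, rerunning the suite on them should let the property-based and metadata-based checks produce results instead of the messages above.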